Residual Alignment: Uncovering the Mechanisms of Residual Networks
The ResNet architecture has been widely adopted in deep learning due to the significant performance boost it obtains from simple skip connections, yet the underlying mechanisms leading to its success remain largely unknown. In this paper, we conduct a thorough empirical study of the ResNet architecture in classification tasks by linearizing its constituent residual blocks using Residual Jacobians and measuring their singular value decompositions. Our measurements reveal a phenomenon we call Residual Alignment, which also provably occurs in a novel mathematical model we propose. This phenomenon reveals a strong alignment between the residual branches of a ResNet (RA2+4), imparting a highly rigid geometric structure to the intermediate representations as they progress *linearly* through the network (RA1) up to the final layer, where they undergo Neural Collapse.
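As a concrete illustration of the measurement described above, here is a minimal PyTorch sketch that linearizes a toy fully-connected residual block by computing the Jacobian of its residual branch at a given input and taking its SVD. The block architecture, the dimensions, and the choice to differentiate only the branch F in x + F(x) are illustrative assumptions, not the paper's exact protocol.

```python
import torch
from torch.autograd.functional import jacobian

# Toy fully-connected residual block: x -> x + F(x).
class ResidualBlock(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = torch.nn.Linear(dim, dim)
        self.fc2 = torch.nn.Linear(dim, dim)

    def residual_branch(self, x):
        # F(x): the computation the skip connection bypasses.
        return self.fc2(torch.relu(self.fc1(x)))

    def forward(self, x):
        return x + self.residual_branch(x)

dim = 32
block = ResidualBlock(dim)  # stands in for one block of a trained ResNet
x = torch.randn(dim)

# Residual Jacobian: the Jacobian of the residual branch F at the input x.
J = jacobian(block.residual_branch, x)  # shape (dim, dim)

# Its singular value decomposition; the study tracks how the top singular
# vectors align across depths and how the top singular values scale.
U, S, Vh = torch.linalg.svd(J)
print("top singular values:", S[:5])
```

In a real experiment one would repeat this per block of a trained network and compare the top left and right singular subspaces across depths.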
Rare Text Semantics Were Always There in Your Diffusion Transformer
Kang, Seil, Han, Woojung, Ju, Dayun, Hwang, Seong Jae
Building on flow- and diffusion-based transformers, Multi-modal Diffusion Transformers (MM-DiTs) have reshaped text-to-vision generation, gaining acclaim for exceptional visual fidelity. As these models advance, users continually push the boundary with imaginative or rare prompts, which even advanced models falter in generating, since the underlying concepts are often too scarce to leave a strong imprint during pre-training. In this paper, we propose a simple yet effective intervention that surfaces rare semantics inside MM-DiTs without additional training steps, data, denoising-time optimization, or reliance on external modules (e.g., large language models). In particular, the joint-attention mechanism intrinsic to MM-DiT sequentially updates text embeddings alongside image embeddings throughout the transformer blocks. We find that by expanding the representational basins around text token embeddings via a variance scale-up before the joint-attention blocks, rare semantics clearly emerge in MM-DiT's outputs. Furthermore, our results generalize across text-to-vision tasks, including text-to-image, text-to-video, and text-driven image editing. Our work invites generative models to reveal the semantics that users intend: once hidden, yet ready to surface.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
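To make the variance scale-up intervention concrete, the sketch below expands the spread of text token embeddings around their mean before they enter a joint-attention block. The function name, the per-dimension centering, and the scale factor `alpha` are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def scale_up_text_variance(text_emb: torch.Tensor, alpha: float = 1.5) -> torch.Tensor:
    """Widen the representational basin around text token embeddings.

    text_emb: (num_tokens, dim) text embeddings entering a joint-attention block.
    alpha:    scale factor > 1; deviations from the mean are amplified, so the
              variance grows by roughly alpha**2. Both the centering choice and
              the value of alpha are assumptions for illustration.
    """
    mean = text_emb.mean(dim=0, keepdim=True)  # per-dimension mean over tokens
    return mean + alpha * (text_emb - mean)    # rescale deviations from the mean

# Hypothetical usage on embeddings headed into an MM-DiT joint-attention block:
text_emb = torch.randn(77, 4096)               # e.g., 77 tokens, 4096-dim each
scaled = scale_up_text_variance(text_emb, alpha=1.5)
print(scaled.var().item() / text_emb.var().item())  # close to alpha**2
```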
Whitening Spherical Gaussian Mixtures in the Large-Dimensional Regime
Boudjemaa, Mohammed Racim Moussa, Kalle, Alper, Mai, Xiaoyi, Goulart, José Henrique de Morais, Févotte, Cédric
Whitening is a classical technique in unsupervised learning that can facilitate estimation tasks by standardizing data. An important application is the estimation of latent variable models via the decomposition of tensors built from high-order moments. In particular, whitening orthogonalizes the means of a spherical Gaussian mixture model (GMM), thereby making the corresponding moment tensor orthogonally decomposable, hence easier to decompose. However, in the large-dimensional regime (LDR) where data are high-dimensional and scarce, the standard whitening matrix built from the sample covariance becomes ineffective because the latter is spectrally distorted. Consequently, whitened means of a spherical GMM are no longer orthogonal. Using random matrix theory, we derive exact limits for their dot products, which are generally nonzero in the LDR. As our main contribution, we then construct a corrected whitening matrix that restores asymptotic orthogonality, allowing for performance gains in spherical GMM estimation.
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
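The breakdown of standard whitening in the large-dimensional regime can be sketched numerically. The toy below, under hypothetical assumptions (two equal-weight components, known noise variance), builds the usual whitener from the sample second moment of a spherical GMM and checks the cosine between the whitened means; with n comparable to d it stays noticeably away from zero, whereas n >> d would drive it toward zero. The paper's random-matrix correction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma2, K = 200, 300, 1.0, 2  # n ~ d: the large-dimensional regime

# Hypothetical spherical GMM with K = 2 equal-weight components.
mus = 3.0 * rng.normal(size=(K, d)) / np.sqrt(d)
labels = rng.integers(0, K, size=n)
X = mus[labels] + np.sqrt(sigma2) * rng.normal(size=(n, d))

# Standard whitening: estimate M2 = sum_k pi_k mu_k mu_k^T by subtracting the
# (assumed known) noise variance from the sample second moment, then invert
# the square root of its top-K part. In the LDR the sample spectrum is
# distorted, so this whitener no longer orthogonalizes the means.
M2_hat = X.T @ X / n - sigma2 * np.eye(d)
eigvals, eigvecs = np.linalg.eigh(M2_hat)           # ascending eigenvalues
top = eigvecs[:, -K:] / np.sqrt(eigvals[-K:])       # columns: u_j / sqrt(lam_j)
W = top.T                                           # K x d whitening map

wm = (W @ mus.T).T                                  # whitened (true) means
wm /= np.linalg.norm(wm, axis=1, keepdims=True)
print("cosine between whitened means:", wm[0] @ wm[1])  # nonzero when n ~ d
```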